Dynamic Operand Interchange for Low Power
Abstract
Power consumption of an execution unit can be reduced by interchanging its two input operands. A dynamic operand interchange method using a hardware power estimator is proposed. An efficient way of constructing the estimator is also described. Power reduction for multipliers is up to 29.6%.

Introduction: As the need for portable systems increases, low-power design has become a very important issue. Design methods for low power can be divided into two categories depending on whether additional hardware for controlling power is needed. Precomputation architecture [1] and guarded evaluation [2] require extra logic to control the inputs to combinational logic. Methods in the other category deal with the trade-offs between area/delay and power. One of them is operand interchange [3,4], which is based on the fact that the positions of the two inputs to an asymmetric execution unit affect the power consumption of the unit. In [3], it is shown that, when two sequences of input variables (one for the left operand and one for the right) and typical values for the variables are given, operand interchange can reduce power consumption. In this letter, we show that operand interchange can also be done dynamically during hardware execution and that it reduces power consumption considerably. An architecture controlling the interchange is defined, and an efficient way to decide the control logic, based on the relation between sign bits and power consumption, is proposed.

Architecture for operand interchange: An operand to an execution unit can be stored either in a register directly connected to the execution unit or in one of the registers of a register file with a mux in between. For the former case, a modified architecture for operand interchange is shown in Fig. 1. The k-th operand values are denoted by I1(k) and I2(k).
From I1(k) and I2(k), which are currently being evaluated by the execution unit, and I1(k+1) and I2(k+1), which will be evaluated at the next clock cycle, a mux control signal Change is generated to determine whether the two operands are interchanged. Since Change and the two operands are connected directly from synchronous elements to the muxes, no glitch is produced by mux selection. In the latter case, two additional muxes are needed at the inputs of the register file, and their outputs become I1(k) and I2(k), respectively. There can be glitches at the input of the execution unit due to the delay of the pre-existing mux at the output of the register file. A simple way to avoid them is to insert a buffer at the DFF output. In this architecture, however, inputs to the execution unit can still contain glitches because the path delays to a mux select and to a mux input are different. Glitches at the inputs increase switching activity in the execution unit drastically. Accordingly, we concentrate only on the former case in this letter. The overhead of this architecture is that area, delay and power are increased by the estimation logic, one DFF, and two muxes. To reduce the overhead, the estimation logic should be as small as possible compared to the execution unit. An efficient method to determine the estimation logic is described in the following.

Deciding estimation logic: The estimation logic for operand interchange is a hardware power estimator. From the two current inputs and the two next inputs, it should determine as correctly as possible whether an interchange reduces the power of the execution unit. In [3], the average Hamming distance between two subsequent inputs is used, but a hardware calculator for it is too costly. An efficient alternative is to use only sign bits. In 2's complement representation, the MSBs have the same value as the sign. Fig. 2 shows how the average total switching activity in a 16-bit array multiplier changes according to the sign change. We fixed 8 bits as sign bits and generated inputs randomly.
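
The sign-bit set-up described above can be sketched in Python. This is only an illustrative sketch under stated assumptions: it assumes that "fixing 8 bits as sign bits" means the top 8 bits of each 16-bit operand are copies of the sign, and the helper names (random_operand, msb, hamming_distance, sign_transition) are hypothetical and not taken from the letter. The Hamming distance helper corresponds to the metric attributed to [3]; the sign helpers correspond to the cheaper metric used here.

```python
import random

WIDTH = 16      # operand width of the multiplier discussed in the text
SIGN_BITS = 8   # assumption: this many leading bits all equal the sign


def random_operand(negative, width=WIDTH, sign_bits=SIGN_BITS, rng=random):
    """Return a width-bit two's-complement pattern whose top sign_bits equal the sign."""
    low_bits = width - sign_bits
    value = rng.getrandbits(low_bits) if low_bits > 0 else 0
    if negative:
        # Set every one of the leading sign bits to 1.
        value |= ((1 << sign_bits) - 1) << low_bits
    return value


def msb(value, width=WIDTH):
    """Most significant bit, which equals the sign in two's complement."""
    return (value >> (width - 1)) & 1


def hamming_distance(a, b):
    """Number of bit positions in which two patterns differ."""
    return bin(a ^ b).count("1")


def sign_transition(cur, nxt, width=WIDTH):
    """Label a pair of (left, right) operand tuples, e.g. '++ -> +-'."""
    def sym(v):
        return "-" if msb(v, width) else "+"
    return f"{sym(cur[0])}{sym(cur[1])} -> {sym(nxt[0])}{sym(nxt[1])}"
```

Computing sign_transition needs only four bits per cycle, whereas hamming_distance needs all bits of both consecutive inputs, which is the cost difference the text refers to.
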
In the x-axis of Fig. 2, ++ → ++ means that the current sign bits of the left and right operands are 0 and 0 and the next sign bits are 0 and 0, respectively, and so on. From these results, we can see that the power of the multiplier is reduced on average by interchanging the next two operand inputs when the four sign bits represent ++ → +−, so that they become ++ → −+. The other such cases are +− → −+, −+ → +−, and −− → +−. We experimented with all sign bit lengths and found that switching activity can be reduced for the four cases in most ranges (of course, the experiment cannot cover all conditions). The only exception occurs for ++ → +− when the sign bit length is shorter than 4 bits. Therefore, an estimation logic function f_AM for the array multiplier, written with one product term per case listed above, is as follows:

\[
f_{AM} = \overline{s_1(k)}\,\overline{s_2(k)}\,\overline{s_1(k+1)}\,s_2(k+1)
       + \overline{s_1(k)}\,s_2(k)\,s_1(k+1)\,\overline{s_2(k+1)}
       + s_1(k)\,\overline{s_2(k)}\,\overline{s_1(k+1)}\,s_2(k+1)
       + s_1(k)\,s_2(k)\,\overline{s_1(k+1)}\,s_2(k+1)
\tag{1}
\]

where s1 and s2 denote the MSBs of the left and right operands, respectively, and an overbar denotes the complement. In the same way, we derived f_BM for a Booth multiplier, which is the same as (1). There is no exception for the Booth multiplier.

Experimental results: We experimented with five examples, each synthesized by HYPER [5] with one multiplier allocated. They are converted to gate-level VHDL descriptions and simulated to measure switching activity. We used 200 real speech data samples as the test input to all the examples except DiffEq, which has its own test input. Total switching activity for an array multiplier and a Booth multiplier is shown in Table 1. Switching activity of the additional hardware is negligible compared with that of the multipliers (about 0.25% for array and 0.01% for Booth). When the proposed additional circuit is included (dynamic), switching activity is reduced by up to 29.6%. Switching activity is increased only for the Booth multiplier employed in DiffEq.
This is because power estimation based only on sign bits is not accurate enough for the input sequence to that multiplier. We also compared the results with those of the static operand interchange proposed in [3] (static), which uses average Hamming distance as a power metric. In seven cases out of ten, the dynamic method is better. Though the power estimation based on sign bits is less accurate, we can interchange operand values at every clock cycle. Furthermore, we do not need to know the actual input values to the multipliers.

Conclusion: We propose in this letter an architecture for dynamic operand interchange and a method to determine its power estimation logic based on sign bits. Dynamic operand interchange is better than the conventional static method in that the interchange can be done at every clock cycle with minimum hardware overhead. We are devising a more efficient estimation logic.

References
[1] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou: 'Precomputation-based sequential logic optimization for low power', IEEE Transactions on VLSI Systems, 1994, 2, (4), pp. 426-436
[2] V. Tiwari, S. Malik, and P. Ashar: 'Guarded evaluation: pushing power management to logic synthesis/design', Proceedings of International Symposium on Low Power Design, 1995, pp. 221-226
[3] E. Musoll and J. Cortadella: 'Scheduling and resource binding for low power', Proceedings of International Symposium on System Synthesis, 1995, pp. 104-109
[4] T-C. Lee, V. Tiwari, S. Malik, and M. Fujita: 'Power analysis and minimization techniques for embedded DSP software', IEEE Transactions on VLSI Systems, 1997, 5, (1), pp. 123-135
[5] J. Rabaey, C. Chu, P. Hoang, and M. Potkonjak: 'Fast prototyping of datapath-intensive architecture', IEEE Transactions on Computer-Aided Design, 1991, 10, (7), pp. 847-860

Taekyoon Ahn and Kiyoung Choi
School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea
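
As an illustration of the sign-based decision described in the letter, the following Python sketch models the Change signal as a lookup over the four sign patterns named in the text (++ → +−, +− → −+, −+ → +−, −− → +−) and applies it cycle by cycle to a stream of operand pairs. It is a behavioural sketch only, not the authors' hardware: the names SWAP_PATTERNS, change and interchange_stream are hypothetical, and it assumes that the "current" operands are the ones actually presented to the execution unit after any earlier interchange.

```python
WIDTH = 16  # operand width; the text discusses a 16-bit multiplier

# (s1(k), s2(k), s1(k+1), s2(k+1)) patterns for which the text reports
# that interchanging the next operands reduces power on average.
SWAP_PATTERNS = {
    (0, 0, 0, 1),  # ++ -> +-
    (0, 1, 1, 0),  # +- -> -+
    (1, 0, 0, 1),  # -+ -> +-
    (1, 1, 0, 1),  # -- -> +-
}


def msb(value, width=WIDTH):
    """Sign bit of a width-bit two's-complement pattern."""
    return (value >> (width - 1)) & 1


def change(cur, nxt, width=WIDTH):
    """Return True if the next (left, right) pair should be interchanged."""
    key = (msb(cur[0], width), msb(cur[1], width),
           msb(nxt[0], width), msb(nxt[1], width))
    return key in SWAP_PATTERNS


def interchange_stream(pairs, width=WIDTH):
    """Apply the decision to a sequence of (left, right) operand pairs.

    Assumption: 'cur' is the pair actually fed to the unit in the previous
    cycle, i.e. after any interchange that was applied to it.
    """
    out = []
    cur = None
    for left, right in pairs:
        if cur is not None and change(cur, (left, right), width):
            left, right = right, left
        cur = (left, right)
        out.append(cur)
    return out
```

Because multiplication is commutative, interchanging the operands leaves every product unchanged; only the order in which bit patterns reach the two multiplier inputs differs. For example, `interchange_stream([(1, 2), (3, 0x8000)])` swaps the second pair, since the sign pattern is ++ followed by +−.
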
Similar resources
Power Aware Reconfigurable Multiplier for DSP Applications
DSP applications are rich in multiplication operations. Hence there is a growing need in improving the efficiency of multipliers. To improve the performance of multipliers, reconfiguration is introduced. In this paper, reconfiguration is introduced in the form of one level recursive architecture to the existing modified booth multiplier (MBM). It provides reconfigurable modes that satisfy multi...
Dynamic operand transformation for low-power multiplier-accumulator design
The design of portable battery-operated devices requires low-power computation circuits. This paper presents a new multiplier-accumulator (MAC) design approach, which in contrast to existing methods exploits dynamic operand transformation to reduce power consumption. The key idea is to compare current values of input operands with previous values and depending on computed Hamming distance to us...
Low Power Multiplication Algorithm for Switching Activity Reduction through Operand Decomposition
A novel low power multiplication algorithm for reducing switching activity through operand decomposition is proposed. Our experimental results show 12% to 18% reduction in logic transitions in both array multipliers and tree multipliers of 32 bits and 64 bits. Similar results are obtained for dynamic power dissipation after logic synthesis. One additional logic gate is required on the critical ...
Low Power Synthesis in Digital Design by Automatic Insertion of Clock Gating and Operand Isolation Cells
This work presents a design and verification of low power and high performance router by using dynamic power reduction technique i.e. Clock gating and Operand isolation. The power consumption of the presented router is significantly lower than that of a router with unnecessary switching activities. The clock gating and operand isolation techniques allows a variety of features such as easily con...
Selective Light Vth Hopping (SLITH): Bridging the Gap between Run-Time Dynamic and Leakage Power Reduction
Ever since the invention of various leakage power reduction techniques, leakage and dynamic power reduction techniques are categorized into two separate sets. Most of them cannot be applied together during runtime. The gap between them is due to the large energy breakeven time (EBT) and wakeup time (WUT) of conventional leakage reduction techniques. This paper proposes a new leakage reduction t...